Travel Demand Trends to Tech Hubs (EDA and Hypothesis Testing)

Joyeuse & Grace & Natacha

Data Analysts

2025-06-30

Introduction

In this notebook, we perform an exploratory and descriptive analysis of the Travel Demand Dataset, which consists of flight data accessed via the Amadeus API. The project aims to uncover travel trends to five global tech hubs (San Francisco, London, Bangalore, Singapore, and Tel Aviv) in order to inform strategic decisions for travel brands.

This analysis focuses on key variables such as flight price, number of stops, travel time, and available seats. Through data visualization and summary statistics, we explore underlying patterns and relationships in the data. This foundational work is critical for identifying trends, spotting potential biases, and guiding future modeling or policy-focused analysis.

Objectives

  • Analyze flight demand and pricing trends across major tech cities.
  • Understand how attributes like destination, stop type, and travel time influence flight prices.
  • Use statistical tests to evaluate whether differences across groups (e.g., destinations or stop types) are significant.
  • Provide actionable insights and recommendations to help travel brands make data-informed decisions.

Exploratory Data Analysis (EDA)

We begin our analysis by importing core Python libraries including pandas, numpy, seaborn, matplotlib.pyplot, plotly.express, and os. These tools support efficient data manipulation, numerical analysis, visualization, and directory management. Warnings are suppressed to ensure a cleaner output during rendering.

To maintain a reproducible workflow, we define directory paths for storing raw data, processed files, results, and documentation. The cleaned dataset is loaded from the processed directory into a Pandas DataFrame for analysis.
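
The setup described above might look roughly like the sketch below. The folder names (raw, processed, results, docs) and the helper `load_processed` are assumptions made for illustration; the notebook's actual paths and file names are not given in the text.

```python
import os
import warnings

import pandas as pd

# Suppress warnings so the rendered output stays clean.
warnings.filterwarnings("ignore")

# Hypothetical project layout: these folder names are assumptions.
BASE_DIR = os.getcwd()
DIRS = {
    name: os.path.join(BASE_DIR, name)
    for name in ("raw", "processed", "results", "docs")
}


def load_processed(filename, processed_dir=None):
    """Read a cleaned CSV from the processed directory into a DataFrame."""
    if processed_dir is None:
        processed_dir = DIRS["processed"]
    return pd.read_csv(os.path.join(processed_dir, filename))
```

Keeping every path in one dictionary means that relocating the project only requires changing `BASE_DIR`.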

Dataset Overview

The dataset contains 7,870 rows and 8 columns, covering both numerical and categorical flight-related variables such as:

  • Price (USD)
  • Number of Stops
  • Available Seats
  • Travel Time
  • Destination
  • Airline
  • Departure Date

Preprocessing includes converting the Departure Date column to datetime format and converting the Price (USD) field to a numeric type.
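
A minimal sketch of these two preprocessing steps follows. The sample rows are invented for illustration, and the assumption that raw prices may carry currency symbols or thousands separators is ours; the actual raw format is not described in the text.

```python
import pandas as pd

# Invented sample rows; the real raw values may look different.
raw = pd.DataFrame({
    "Departure Date": ["2025-07-01", "2025-07-02", "not a date"],
    "Price (USD)": ["$1,250.50", "89", "N/A"],
})


def preprocess(df):
    """Return a copy with a datetime date column and a numeric price column."""
    out = df.copy()
    # Unparseable dates become NaT instead of raising an error.
    out["Departure Date"] = pd.to_datetime(out["Departure Date"], errors="coerce")
    # Keep only digits and the decimal point, then convert;
    # anything left unparseable becomes NaN.
    price_text = out["Price (USD)"].astype(str).str.replace(
        r"[^0-9.]", "", regex=True
    )
    out["Price (USD)"] = pd.to_numeric(price_text, errors="coerce")
    return out


clean = preprocess(raw)
```

Using `errors="coerce"` turns bad values into missing values that can be counted and inspected afterwards, rather than stopping the pipeline at the first malformed row.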

Summary Statistics

  • Numerical variables: prices range widely (from under $80 to over $17,000), and most flights have 1 stop and 7–9 seats available.
  • Categorical variables show 5 unique destinations and 53 different airlines, allowing for rich segmentation and comparison.
  • Travel Time is stored in timedelta format and analyzed in hours for correlation and visualization purposes.
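
As an illustration of the points above, the sketch below converts a timedelta column to hours and produces numerical and categorical summaries. The four rows are made up, and the derived column name `Travel Hours` is an assumption, not necessarily the name used in the notebook.

```python
import pandas as pd

# Made-up rows that mimic the column types described above.
flights = pd.DataFrame({
    "Price (USD)": [120.0, 950.0, 1800.0, 640.0],
    "Number of Stops": [0, 1, 1, 2],
    "Travel Time": pd.to_timedelta(
        ["02:30:00", "11:00:00", "17:45:00", "22:15:00"]
    ),
    "Destination": ["London", "Singapore", "Singapore", "Bangalore"],
    "Airline": ["A", "B", "B", "C"],
})

# Express the timedelta as fractional hours for correlation and plotting.
flights["Travel Hours"] = flights["Travel Time"].dt.total_seconds() / 3600

# Count, mean, spread, and quartiles for the numerical columns.
numeric_summary = flights[
    ["Price (USD)", "Number of Stops", "Travel Hours"]
].describe()

# Number of distinct values in each categorical column.
categorical_counts = flights[["Destination", "Airline"]].nunique()
```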

This EDA phase sets the foundation for statistical testing by uncovering distributions, group differences, and potential relationships within the dataset.

Key Insights from Visualizations

Average Flight Prices to Tech Hubs

This bar chart shows the average flight price to each tech hub.

  • Insight: Tel Aviv and Singapore have the highest average prices, while San Francisco and London are more affordable.
  • Implication: Dynamic pricing strategies should consider geographic pricing variation to stay competitive.
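
The averages behind a chart like this can be computed with a single group-by. The sketch below uses invented prices, so the resulting ranking is illustrative only and does not reproduce the dataset's figures.

```python
import pandas as pd

# Invented prices for illustration.
flights = pd.DataFrame({
    "Destination": ["London", "London", "Singapore", "Singapore", "Tel Aviv"],
    "Price (USD)": [400.0, 600.0, 1100.0, 1300.0, 1500.0],
})

# Mean price per destination, most expensive first.
avg_price = (
    flights.groupby("Destination")["Price (USD)"]
    .mean()
    .sort_values(ascending=False)
)
```

The resulting Series can be passed to a plotting call such as `seaborn.barplot` to draw the bar chart.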

Flight Price vs. Travel Time

This scatterplot shows the relationship between travel time and price across destinations.

  • Insight: Longer flights generally cost more, though the pattern varies by city.
  • Implication: Duration-based pricing could optimize revenue and better reflect flight complexity.
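
One way to check how the pattern varies by city is to compare the overall correlation with a per-destination correlation. This is a sketch on invented numbers; the `Travel Hours` column name is an assumption.

```python
import pandas as pd

# Invented travel times (hours) and prices for two destinations.
flights = pd.DataFrame({
    "Destination": ["London"] * 4 + ["Singapore"] * 4,
    "Travel Hours": [7, 8, 9, 10, 16, 18, 20, 22],
    "Price (USD)": [400, 450, 500, 550, 1000, 900, 1200, 1100],
})

# Pearson correlation across all flights.
overall_r = flights["Travel Hours"].corr(flights["Price (USD)"])

# Pearson correlation computed separately within each destination.
by_city_r = pd.Series({
    city: group["Travel Hours"].corr(group["Price (USD)"])
    for city, group in flights.groupby("Destination")
})
```

A large gap between the overall and per-city values would suggest that much of the overall relationship comes from differences between destinations rather than from duration within a route.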

Flight Stop Distribution per Destination

This grouped bar chart compares the number of stops per destination.

  • Insight: Bangalore and Tel Aviv often require 2+ stops, while London and San Francisco have more nonstop options.
  • Implication: Destinations with fewer nonstop flights may be less appealing to time-sensitive travelers and could benefit from improved routing or pricing incentives.
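
The counts behind a grouped bar chart of this kind can be tabulated with `pandas.crosstab`. The rows below are invented for illustration, and the "2+ stops" share is one possible way to summarize the comparison.

```python
import pandas as pd

# Invented rows for illustration.
flights = pd.DataFrame({
    "Destination": ["London"] * 4 + ["Bangalore"] * 4,
    "Number of Stops": [0, 0, 1, 1, 1, 2, 2, 2],
})

# Number of flights per destination and stop count.
stop_counts = pd.crosstab(flights["Destination"], flights["Number of Stops"])

# Same table as row-wise shares, so destinations of different sizes compare fairly.
stop_share = pd.crosstab(
    flights["Destination"], flights["Number of Stops"], normalize="index"
)

# Share of flights with two or more stops, per destination.
two_plus_share = stop_share.loc[:, stop_share.columns >= 2].sum(axis=1)
```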

Average Flight Prices Over Time

This line plot shows average daily prices per destination.

  • Insight: Price fluctuations vary across cities, with some showing consistent price increases or dips.
  • Implication: Pricing teams can use these trends to forecast demand and run targeted promotions during dips or peaks.
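
A sketch of how daily averages per destination could be derived is shown below, using invented dates and prices. The wide table (one column per destination) is a layout choice of this example, not necessarily the notebook's.

```python
import pandas as pd

# Invented dates and prices for illustration.
flights = pd.DataFrame({
    "Departure Date": pd.to_datetime([
        "2025-07-01", "2025-07-01", "2025-07-01",
        "2025-07-02", "2025-07-02", "2025-07-02",
    ]),
    "Destination": [
        "London", "London", "Singapore",
        "London", "Singapore", "Singapore",
    ],
    "Price (USD)": [400.0, 600.0, 1000.0, 550.0, 1200.0, 1400.0],
})

# Average price per day and destination; one column per destination.
daily_avg = (
    flights.groupby(["Departure Date", "Destination"])["Price (USD)"]
    .mean()
    .unstack("Destination")
)

# Day-over-day change highlights consistent increases or dips.
daily_change = daily_avg.diff()
```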

Travel Time Distribution by Destination

This boxplot illustrates the spread and median of travel durations per city.

  • Insight: Singapore and Bangalore have longer and more variable travel times. London and San Francisco have shorter, more stable durations.
  • Implication: Consider offering enhanced services or bundles for longer routes to improve traveler satisfaction.
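
The quantities a boxplot displays (median and interquartile range) can also be tabulated directly, which makes "longer and more variable" easy to compare across cities. The values below are invented, and `Travel Hours` is an assumed column name.

```python
import pandas as pd

# Invented travel times in hours.
flights = pd.DataFrame({
    "Destination": ["London"] * 5 + ["Singapore"] * 5,
    "Travel Hours": [7, 8, 9, 10, 11, 14, 18, 22, 26, 30],
})

grouped = flights.groupby("Destination")["Travel Hours"]

# Median and quartiles per destination, as drawn by a boxplot.
spread = pd.DataFrame({
    "median": grouped.median(),
    "q1": grouped.quantile(0.25),
    "q3": grouped.quantile(0.75),
})

# The interquartile range is a robust measure of variability.
spread["iqr"] = spread["q3"] - spread["q1"]
```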

Insights from Hypothesis Testing

  • Flight prices vary significantly across destinations.
    → Some tech hubs (like Singapore or Tel Aviv) consistently have higher prices, suggesting pricing is influenced by distance, demand, or airline competition.

  • Flight prices also differ by number of stops.
    → Nonstop flights tend to be more expensive, while multi-stop flights are more affordable but may offer lower convenience.

  • There is a positive correlation between travel time and price.
    → Longer travel durations are moderately associated with higher prices, which is consistent with airlines adjusting pricing based on route length.
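
The text does not name the tests that were used. As one plausible setup, the sketch below assumes a one-way ANOVA (`scipy.stats.f_oneway`) for group differences and a Pearson correlation (`scipy.stats.pearsonr`) for travel time versus price. The data, the helper `price_differs_by`, the `Travel Hours` column, and the 0.05 significance level are all assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Invented data; not the notebook's dataset.
flights = pd.DataFrame({
    "Destination": ["London"] * 4 + ["Singapore"] * 4 + ["Tel Aviv"] * 4,
    "Number of Stops": [0, 1, 0, 1, 1, 2, 1, 2, 1, 2, 2, 1],
    "Travel Hours": [7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23],
    "Price (USD)": [400, 450, 500, 550, 1000, 1100, 1200, 1300,
                    1400, 1500, 1600, 1700],
})


def price_differs_by(df, group_col, alpha=0.05):
    """One-way ANOVA of price across the levels of group_col (assumed test)."""
    groups = [g["Price (USD)"].to_numpy() for _, g in df.groupby(group_col)]
    f_stat, p_value = stats.f_oneway(*groups)
    return f_stat, p_value, bool(p_value < alpha)


# Do mean prices differ across destinations, and across stop counts?
f_dest, p_dest, dest_significant = price_differs_by(flights, "Destination")
f_stops, p_stops, stops_significant = price_differs_by(flights, "Number of Stops")

# Linear association between travel time and price.
r_time, p_time = stats.pearsonr(flights["Travel Hours"], flights["Price (USD)"])
```

Because flight prices are often right-skewed, a rank-based alternative such as `scipy.stats.kruskal` may be worth running alongside the ANOVA as a robustness check.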

Recommendations

  • Set different prices for each destination.
    Some cities are much more expensive to fly to, so adjust prices to match demand and travel costs.

  • Price flights based on number of stops.
    Nonstop flights are more convenient and can be priced higher, while multi-stop flights can attract budget travelers.

  • Consider travel time in pricing.
    Longer flights often cost more. Make sure the price reflects the duration, and offer better services on long routes.

Conclusion

There are clear differences in flight prices depending on where the flight goes, how many stops it has, and how long it takes. These insights show that pricing should not be the same for every route. By using this data, airlines and travel platforms can set smarter prices, meet traveler needs better, and increase profits.